import dalex as dx
import pandas as pd
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
import warnings
warnings.filterwarnings('ignore')
dx.__version__
First, split the data into the explanatory variables X and the target variable y.
data = dx.datasets.load_titanic()
X = data.drop(columns='survived')
y = data.survived
data.head(10)
Prepare the model in the following steps:
- numerical_features: choose numerical features to transform
- numerical_transformer pipeline: impute missing values with the median, then scale
- categorical_features: choose categorical features to transform
- categorical_transformer pipeline: impute missing values with the 'missing' string, then one-hot encode
- aggregate those two pipelines into a preprocessor using ColumnTransformer
- classifier model using MLPClassifier - it has 3 hidden layers with sizes 150, 100, 50 respectively
- clf pipeline model, which combines the preprocessor with the basic classifier model
numerical_features = ['age', 'fare', 'sibsp', 'parch']
numerical_transformer = Pipeline(
steps=[
('imputer', SimpleImputer(strategy='median')),
('scaler', StandardScaler())
]
)
categorical_features = ['gender', 'class', 'embarked']
categorical_transformer = Pipeline(
steps=[
('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
('onehot', OneHotEncoder(handle_unknown='ignore'))
]
)
preprocessor = ColumnTransformer(
transformers=[
('num', numerical_transformer, numerical_features),
('cat', categorical_transformer, categorical_features)
]
)
classifier = MLPClassifier(hidden_layer_sizes=(150,100,50), max_iter=500, random_state=0)
clf = Pipeline(steps=[('preprocessor', preprocessor),
('classifier', classifier)])
clf.fit(X, y)
exp = dx.Explainer(clf, X, y)
The functions above are accessible from the Explainer object through its methods.
Each of them returns a new unique object that contains a result field in the form of a pandas.DataFrame and a plot method.
This function is just a normal model prediction; however, it uses the Explainer interface.
Let's create two example persons for this tutorial.
john = pd.DataFrame({'gender': ['male'],
'age': [25],
'class': ['1st'],
'embarked': ['Southampton'],
'fare': [72],
'sibsp': [0],
'parch': [0]},
index = ['John'])
mary = pd.DataFrame({'gender': ['female'],
'age': [35],
'class': ['3rd'],
'embarked': ['Cherbourg'],
'fare': [25],
'sibsp': [0],
'parch': [0]},
index = ['Mary'])
You can make a prediction on many samples at the same time.
exp.predict(X)[0:10]
You can also predict for a single instance; note, however, that the only accepted input format is a pandas.DataFrame.
Prediction of survival for John.
exp.predict(john)
Prediction of survival for Mary.
exp.predict(mary)
'break_down'
'break_down_interactions'
'shap'
This function calculates Variable Attributions as Break Down, iBreakDown or Shapley Values explanations.
The model prediction is decomposed into parts that are attributed to particular variables.
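Break Down attributions are additive: starting from the model's mean prediction (the intercept), each variable's contribution is added in turn until the instance prediction is reached. A minimal pure-Python sketch of this additivity, using made-up numbers rather than output from the model above:

```python
# Toy Break Down-style decomposition: contributions are added to the
# mean (intercept) prediction one variable at a time.
intercept = 0.32                      # hypothetical average model prediction
contributions = {                     # hypothetical per-variable effects
    'gender': 0.05,
    'class': 0.18,
    'age': -0.03,
}

prediction = intercept
for variable, contribution in contributions.items():
    prediction += contribution        # running total after adding this variable

print(round(prediction, 2))           # intercept plus all contributions
```

The result field of a Break Down explanation encodes exactly this running total, one row per variable.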
bd_john = exp.predict_parts(john, type='break_down')
bd_interactions_john = exp.predict_parts(john, type='break_down_interactions')
sh_mary = exp.predict_parts(mary, type='shap', B = 10)
bd_john.result.label = "John"
bd_interactions_john.result.label = "John+"
bd_john.result
bd_john.plot(bd_interactions_john)
sh_mary.result.label = "Mary"
sh_mary.result.loc[sh_mary.result.B == 0, ]
sh_mary.plot(bar_width = 16)
sh_john = exp.predict_parts(john, type='shap', B = 10)
sh_john.result.label = "John"
sh_john.plot(max_vars=5)
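The type='shap' attributions average a variable's Break Down contribution over B random orderings of the variables. For two variables the average can be written out exactly; the sketch below uses hypothetical subset values, not the Titanic model:

```python
from itertools import permutations

# Shapley-style attribution for two toy variables: average each variable's
# contribution over every ordering in which it can be added to the model.
# The subset values below are made up for illustration.
value = {
    frozenset(): 0.32,                          # mean prediction
    frozenset({'gender'}): 0.40,
    frozenset({'class'}): 0.55,
    frozenset({'gender', 'class'}): 0.70,       # full-instance prediction
}

def shapley(variable):
    contributions = []
    for order in permutations(['gender', 'class']):
        before = set()
        for v in order:
            if v == variable:
                # marginal effect of adding `variable` after `before`
                contributions.append(
                    value[frozenset(before | {v})] - value[frozenset(before)]
                )
                break
            before.add(v)
    return sum(contributions) / len(contributions)

phi_gender = shapley('gender')
phi_class = shapley('class')
# the attributions sum to prediction minus mean: 0.70 - 0.32
```

dalex approximates this average with B sampled orderings instead of enumerating all of them, which is why the result stores one row per ordering (the B column).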
'ceteris_paribus'
This function computes individual profiles, also known as Ceteris Paribus Profiles.
cp_mary = exp.predict_profile(mary)
cp_john = exp.predict_profile(john)
cp_mary.result.head()
cp_mary.plot(cp_john)
cp_john.plot(cp_mary, variable_type = "categorical")
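A Ceteris Paribus profile is obtained by varying one variable over a grid of values while everything else in the instance stays fixed. A toy sketch of the idea (toy_model and its scoring rule are made up, standing in for the fitted pipeline):

```python
# Ceteris Paribus sketch: vary one feature over a grid while holding the
# rest of the instance fixed, and record the model response at each point.
def toy_model(instance):
    # hypothetical scoring rule: younger, higher-class passengers score higher
    score = 0.5 - 0.005 * instance['age']
    if instance['class'] == '1st':
        score += 0.2
    return score

instance = {'age': 25, 'class': '1st'}    # the fixed observation
grid = [5, 25, 45, 65]                    # grid of values for 'age'

profile = []
for value in grid:
    varied = dict(instance, age=value)    # change only 'age'
    profile.append((value, toy_model(varied)))

print(profile)                            # one (value, response) pair per grid point
```

The result DataFrame of predict_profile holds the same kind of (variable value, prediction) pairs, one block per variable.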
'classification'
'regression'
This function calculates various Model Performance measures:
mp = exp.model_performance(model_type = 'classification')
mp.result
mp.result.auc[0]
mp.plot()
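Measures like those reported by model_performance() can be computed by hand from predicted probabilities and true labels. A sketch on toy data (the labels and probabilities below are illustrative, not this model's output):

```python
# Classification measures computed from scratch on toy labels.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1, 0.7, 0.8, 0.3]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]   # threshold at 0.5

# confusion-matrix cells
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
```

mp.result collects these measures (plus threshold-free ones such as AUC) in a single DataFrame row.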
'variable_importance'
'ratio'
'difference'
This function calculates Variable Importance.
vi = exp.model_parts()
vi.result
vi.plot(max_vars=5)
It is also possible to calculate the importance of groups of variables.
vi_grouped = exp.model_parts(variable_groups={'personal': ['gender', 'age', 'sibsp', 'parch'],
'wealth': ['class', 'fare']})
vi_grouped.result
vi_grouped.plot()
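model_parts computes permutation-based importance: permute one variable's values, re-score the model, and report how much the loss grows. A deterministic pure-Python sketch with a toy model (a single fixed permutation, a reversal, is used here for reproducibility; dalex averages over random permutations):

```python
# Permutation-importance sketch: how much does the loss grow after the
# values of one variable are permuted?  toy_model is a stand-in model
# that depends only on its first feature, x1.
def toy_model(row):
    return row[0]

X_toy = [[float(i), float(i % 3)] for i in range(10)]
y_toy = [row[0] for row in X_toy]          # target equals x1 exactly

def mse(rows):
    preds = [toy_model(r) for r in rows]
    return sum((p - t) ** 2 for p, t in zip(preds, y_toy)) / len(y_toy)

def loss_after_permuting(col):
    # one fixed permutation (column reversal) keeps the sketch deterministic
    permuted = [row[:] for row in X_toy]
    column = [row[col] for row in permuted][::-1]
    for row, value in zip(permuted, column):
        row[col] = value
    return mse(permuted)

base_loss = mse(X_toy)                                 # 0.0: model is perfect
importance_x1 = loss_after_permuting(0) - base_loss    # large: x1 matters
importance_x2 = loss_after_permuting(1) - base_loss    # 0.0: x2 is irrelevant
```

Grouped importance works the same way, except the whole group of columns is permuted together.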
'partial'
'accumulated'
This function calculates explanations that explore model response as a function of selected variables.
The explanations can be calculated as Partial Dependence Profiles or Accumulated Local Dependence Profiles.
pdp_num = exp.model_profile(type = 'partial')
pdp_num.result["_label_"] = 'pdp'
ale_num = exp.model_profile(type = 'accumulated')
ale_num.result["_label_"] = 'ale'
pdp_num.plot(ale_num)
pdp_cat = exp.model_profile(type = 'partial', variable_type='categorical', variables = ["gender","class"])
pdp_cat.result.loc[:, '_label_'] = 'pdp'
ale_cat = exp.model_profile(type = 'accumulated', variable_type='categorical', variables = ["gender","class"])
ale_cat.result.loc[:, '_label_'] = 'ale'
ale_cat.plot(pdp_cat)
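A Partial Dependence profile at a grid value is the average model response after that value is substituted into every observation. A toy sketch of this averaging (toy_model and the data are made up, standing in for the fitted pipeline):

```python
# Partial-dependence sketch: fix one variable at a grid value for every
# row, predict, and average the responses.
def toy_model(age, fare):
    # hypothetical model: survival chance rises with fare, falls with age
    return 0.3 + 0.01 * fare - 0.005 * age

rows = [(20, 10.0), (40, 30.0), (60, 50.0)]   # (age, fare) observations

def pdp_age(value):
    # average prediction with everyone's age replaced by `value`
    return sum(toy_model(value, fare) for _, fare in rows) / len(rows)

profile = [(v, round(pdp_age(v), 3)) for v in (20, 40, 60)]
print(profile)                                # one (value, average) per grid point
```

A PDP is therefore the average of the Ceteris Paribus profiles of all observations; the accumulated (ALE) variant instead averages local changes, which makes it more robust when variables are correlated.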
Hover over all of the above plots for tooltips with more information.
You can easily save an explainer to a pickle (or, more generally, to a binary form) and load it again. Any local or lambda function in the explainer will be dropped during saving. The residual function is local by default, so if left as the default it is always dropped. Default functions can be retrieved during loading.
# this converts the explainer to a binary form
# exp.dumps()
# this loads the explainer again
# dx.Explainer.loads(pickled)
# this will not retrieve default functions if they were dropped
# dx.Explainer.loads(pickled, use_defaults=False)
# this will save your explainer to the file
# with open('explainer.pkl', 'wb') as fd:
# exp.dump(fd)
# this will load your explainer from the file
# with open('explainer.pkl', 'rb') as fd:
# dx.Explainer.load(fd)
This package uses plotly to render the plots:
- plotly in JupyterLab: Getting Started, Troubleshooting
- use the show=False parameter in the plot method to return a plotly Figure object
- dalex package: Titanic: tutorial and examples